Data skeletons: simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation

Authors

  • James P. McDermott
  • G. Jogesh Babu
  • John C. Liechty
  • Dennis K. J. Lin
Abstract

We consider the problem of density estimation when the data is in the form of a continuous stream with no fixed length. In this setting, implementations of the usual methods of density estimation such as kernel density estimation are problematic. We propose a method of density estimation for massive datasets that is based upon taking the derivative of a smooth curve that has been fit through a set of quantile estimates. To achieve this, a low-storage, single-pass, sequential method is proposed for simultaneous estimation of multiple quantiles for massive datasets that form the basis of this method of density estimation. For comparison, we also consider a sequential kernel density estimator. The proposed methods are shown through a simulation study to perform well and to have several distinct advantages over
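
The idea in the abstract (estimate several quantiles in one pass, then differentiate a curve fitted through them) can be illustrated with a rough sketch. This is not the authors' data-skeleton algorithm: the quantile tracker below is a generic stochastic-approximation update, and the "smooth curve" is replaced by a piecewise-linear CDF whose slope serves as the density. The names `StreamingQuantiles`, `density_from_quantiles`, `gain`, and the step-size exponent 0.75 are all assumptions made for illustration.

```python
class StreamingQuantiles:
    """Track several quantiles in one pass using O(len(probs)) memory.

    Generic stochastic-approximation sketch (an assumption), not the
    method proposed in the paper.
    """

    def __init__(self, probs, gain=1.0):
        self.probs = list(probs)   # target probability levels
        self.gain = gain           # hypothetical step-size scale
        self.estimates = None      # initialised from the first observation
        self.n = 0                 # number of observations seen so far

    def update(self, x):
        self.n += 1
        if self.estimates is None:
            self.estimates = [float(x)] * len(self.probs)
            return
        # Decreasing step size; the exponent 0.75 is an illustrative choice.
        step = self.gain / self.n ** 0.75
        for i, p in enumerate(self.probs):
            indicator = 1.0 if x <= self.estimates[i] else 0.0
            # Move up when x lies above the estimate, down otherwise.
            self.estimates[i] += step * (p - indicator)


def density_from_quantiles(probs, quantiles):
    """Return (midpoint, slope) pairs of a piecewise-linear CDF
    drawn through the points (quantiles[i], probs[i])."""
    points = []
    for i in range(len(probs) - 1):
        width = quantiles[i + 1] - quantiles[i]
        if width <= 0:
            continue  # skip tied or crossed quantile estimates
        mid = 0.5 * (quantiles[i] + quantiles[i + 1])
        points.append((mid, (probs[i + 1] - probs[i]) / width))
    return points
```

In use, each incoming observation would be passed to `update`, and `density_from_quantiles(tracker.probs, tracker.estimates)` could be called at any time to read off a crude density estimate. A smoother interpolant would be closer in spirit to the approach the abstract describes.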

Similar resources

Exponentially Weighted Simultaneous Estimation of Several Quantiles

In this paper we propose a new method for simultaneously generating multiple quantiles corresponding to given probability levels from data streams and massive data sets. This method provides a basis for the development of single-pass, low-storage quantile estimation algorithms, which differ in complexity, storage requirements and accuracy. We demonstrate that such algorithms may perform well even for hea...

Statistical methodology for massive datasets and model selection

Astronomy is facing a revolution in data collection, storage, analysis, and interpretation of large datasets. The data volumes here are several orders of magnitude larger than what astronomers and statisticians are used to dealing with, and the old methods simply do not work. The National Virtual Observatory (NVO) initiative has recently emerged in recognition of this need and to federate numer...

Estimation of E(Y) from a Population with Known Quantiles

In this paper, we consider the problem of estimating E(Y) based on a simple random sample when at least one of the population quantiles is known. We propose a stratified estimator of E(Y), and show that it is strongly consistent. We then establish the asymptotic normality of the suggested estimator, and prove that it ...

The Beta-Weibull Logarithmic Distribution: Some Properties and Applications

In this paper, we introduce a new five-parameter distribution with increasing, decreasing, or bathtub-shaped failure rate, called the Beta-Weibull-Logarithmic (BWL) distribution. Using the Stirling polynomials, various properties of the new distribution such as its probability density function, its reliability and failure rate functions, quantiles and moments, Rényi and Shannon entropie...

Simultaneous robust estimation of multi-response surfaces in the presence of outliers

A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...

Journal title:
  • Statistics and Computing

Volume 17, Issue

Pages -

Publication date: 2007